Performance Modelling and Optimization of Memory Access on Cellular Computer Architecture Cyclops64

نویسندگان

  • Yanwei Niu
  • Ziang Hu
  • Kenneth E. Barner
  • Guang R. Gao
چکیده

This paper focuses on the Cyclops64 computer architecture and presents an analytical model and performance simulation results for the preloading and loop unrolling approaches to optimize the performance of SVD (Singular Value Decomposition) benchmark. A performance model for dissecting the total execution cycles is presented. The data preloading using “memcpy” or hand optimized “inline” assembly code, and the loop unrolling approach are implemented and compared with each other in terms of the total number of memory access cycles. The key idea is to preload data from offchip to onchip memory and store the data back after the computation. These approaches can reduce the total memory access cycles and can thus improve the benchmark performance significantly.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A New Approach to Solve N-Queen Problem with Parallel Genetic Algorithm

Over the past few decades great efforts were made to solve uncertain hybrid optimization problems. The n-Queen problem is one of such problems that many solutions have been proposed for. The traditional methods to solve this problem are exponential in terms of runtime and are not acceptable in terms of space and memory complexity. In this study, parallel genetic algorithms are proposed to solve...

متن کامل

Empirical Study for Optimization of Power-Performance with On-Chip Memory

Recently, power-performance (performance per uniform power consumption) has become a more important factor in modern high-performance microprocessors. In processor design, it is a well-known that off-chip memory access has a large impact on both performance and power consumption. On-chip memory is one solution for this problem, so that many processors such as the Renesas SH-4 and some ARM archi...

متن کامل

Energy Efficient Novel Design of Static Random Access Memory Memory Cell in Quantum-dot Cellular Automata Approach

This paper introduces a peculiar approach of designing Static Random Access Memory (SRAM) memory cell in Quantum-dot Cellular Automata (QCA) technique. The proposed design consists of one 3-input MG, one 5-input MG in addition to a (2×1) Multiplexer block utilizing the loop-based approach. The simulation results reveals the excellence of the proposed design. The proposed SRAM cell achieves 16% ...

متن کامل

A High Performance Parallel IP Lookup Technique Using Distributed Memory Organization and ISCB-Tree Data Structure

The IP Lookup Process is a key bottleneck in routing due to the increase in routing table size, increasing traıc and migration to IPv6 addresses. The IP address lookup involves computation of the Longest Prefix Matching (LPM), which existing solutions such as BSD Radix Tries, scale poorly when traıc in the router increases or when employed for IPv6 address lookups. In this paper, we describe a ...

متن کامل

A High Performance Parallel IP Lookup Technique Using Distributed Memory Organization and ISCB-Tree Data Structure

The IP Lookup Process is a key bottleneck in routing due to the increase in routing table size, increasing traıc and migration to IPv6 addresses. The IP address lookup involves computation of the Longest Prefix Matching (LPM), which existing solutions such as BSD Radix Tries, scale poorly when traıc in the router increases or when employed for IPv6 address lookups. In this paper, we describe a ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2005